There is an abundance of publically available taxonomic information, including:
However, these data sources tend to have different ways of specifying taxonomic classifications, including:
In addition, the same method of defining classifications could be encoded with different formats. For example, all these formats are from refernece databases that supply classifications using a series of taxon names:
S001191995 uncultured archaeon; LCDARCH35 Lineage=Root;rootrank;Archaea;domain;"Euryarchaeota";phylum;"Methanomicrobia";class;Methanosarcinales;order;untaxmap_Methanosarcinales;
AB001438.1.1662 Eukaryota;SAR;Alveolata;Dinoflagellata;Dinophyceae;Gymnodiniphycidae;Kareniaceae;Gyrodinium sp.
Lachnum_sp|JQ347180|SH189775.06FU|reps|k__Fungi;p__Ascomycota;c__Leotiomycetes;o__Helotiales;f__Hyaloscyphaceae;g__Lachnum;s__Lachnum_sp
All this diversity can make parsing data into a common form difficult.
extract_taxonomyRather than exacerbating the syntactic pollution with another custom format or creating a parser for every database-specific format, the function extract_taxonomy parses taxonomy information from arbitrary characters strings (e.g. sequence headers) identified by a regex expression. Although the motivation for creating extract_taxonomy is to parse FASTA sequence headers, the function is not specific to FASTA or even to sequence information, so I will use the word “observation” instead of “sequence header” from now on.
extract_taxonomy can get taxonomic classfiacitons from any of the previously mentioned format types:
For all but a full classifcaiton with taxon names (e.g. Bacteria;Proteobacteria;Gammaproteobacteria; Enterobacterales;Enterobacteriaceae;Escherichia;coli), extract_taxonomy will query online databases to get the classificaiton. Note that this usually invloves quierying NCBI, which is often quite slow, so it it best to use data that has a full classifcaiton with taxon names when possible. At the minimum, the output of extract_taxonomy consists of unique taxon identifiers, and the tree structure of the taxonomy shared by all sequences.
NOTE:
extract_taxonomyhas been designed principally for parsing character vectors, like FASTA headers, rather than tabular data, like the abundance matricies produced by QIIME and mothur. There is another function in development calledparse_taxonomy_tableintended to parse this kind of data, but it is not quite mature yet, so it will not be discussed for now. See?parse_taxonomy_tablefor more information.
There are three important arguments that will usually be relevant: regex, key, and database.
regex argument defines the structure of the strings and the location of relevant information (e.g. GenBank accession numbers) via regex capture groups (i.e. pairs of unescaped parentheses).key argument defines what kind of information is in each capture group, determining how it will be parsed. The elements of key has a defined set of possible values; see the extract_taxonomy help page for options.database argument determines the online taxonomy database that will be queried when necessary. Usually, this is ncbi, but others databases are possible (although not as well tested yet); see the extract_taxonomy help page for options.Other arguments are important in special cases. Some will be explained in the examples below, but for more details see ?extract_taxonomy.
An object of type taxmap is returned. See ?taxmap for more details.
FASTA files are one of the most common formats that sometimes contain taxonomic data. The read.FASTA function from the ape (Paradis, Claude, and Strimmer 2004) package is commonly used to parse FASTA files in R and extract_taxonomy recognizes its output. When the output of ape::read.FASTA is given to extract_taxonomy, the headers are parsed and the sequences are saved in the output object.
FASTA files downloaded from GenBank custom queries contain the GenBank id and the accession number/version to identify sequences. Taxonomic information can be retrieved using either of these identifiers. The following shows how to extract the GI numbers, accession id, and description from the headers. The GenBank accession number is being used to look up the taxonomy information (hence "obs_id" in key option), while the GI numbers and description are being returned without contributing taxonomic information (hence "obs_info" in key option).
library(metacoder)
file_path <- system.file("extdata", "ncbi_basidiomycetes.fasta", package = "metacoder")
sequences <- ape::read.FASTA(file_path)
genbank_ex_data <- extract_taxonomy(sequences,
regex = "^.*\\|(.*)\\|.*\\|(.*)\\|(.*)$",
key = c(gi_no = "obs_info", "obs_id", desc = "obs_info"),
database = "ncbi")
The sequence headers have the format:
[1] "gi|626494018|ref|NR_119774.1| Cortinarius cramesinus PDD 27173 ITS region; from TYPE material"
[2] "gi|1028916741|ref|NR_137130.1| Lactarius perparvus GENT KW320 ITS region; from TYPE material"
[3] "gi|1028916731|ref|NR_137120.1| Cortinarius elotoides IB 19870060 ITS region; from TYPE material"
[4] "gi|1028916725|ref|NR_137114.1| Stomatisora psychotriicola PREM 60886 ITS region; from TYPE material"
[5] "gi|1028916658|ref|NR_137047.1| Fomitiporia tenuis MUCL 44802 ITS region; from TYPE material"
[6] "gi|1028916619|ref|NR_137008.1| Derxomyces qinlingensis CGMCC AS 2.2446 ITS region; from TYPE material"
We can plot the result using plot:
set.seed(2)
heat_tree(genbank_ex_data,
node_size = n_obs,
node_color = n_obs,
node_label = name,
layout = "fruchterman")
The format of the UNITE (Kõljalg et al. 2005) FASTA release has two pieces of information from which classifications can be determined. The GenBank sequence identifier can be used to look up the taxon id from GenBank. Alternatively, the classifications specified in the header can be used to make an arbitrarily coded taxonomy.
The GenBank accession number in the second entry of UNITE sequence headers can be used to look up the taxon assigned to each sequence by GenBank. However, some of the sequences in the UNITE database do not have a GenBank accession number; these IDs start with UDB and should be filtered out:
file_path <- system.file("extdata", "unite_general_release.fasta", package = "metacoder")
sequences <- ape::read.FASTA(file_path)
unite_ex_data_1 <- extract_taxonomy(sequences[!grepl(pattern = "\\|UDB", names(sequences))],
regex = "^(.*)\\|(.*)\\|(.*)\\|.*\\|(.*)$",
key = c(seq_name = "obs_info", "obs_id",
other_id = "obs_info", tax_string = "obs_info"),
database = "ncbi")
The sequence headers have the format:
[1] "Lachnum_sp|JQ347180|SH189775.06FU|reps|k__Fungi;p__Ascomycota;c__Leotiomycetes;o__Helotiales;f__Hyaloscyphaceae;g__Lachnum;s__Lachnum_sp"
[2] "Lachnellula_calyciformis|U59145|SH189776.06FU|refs|k__Fungi;p__Ascomycota;c__Leotiomycetes;o__Helotiales;f__Hyaloscyphaceae;g__Lachnellula;s__Lachnellula_calyciformis"
[3] "Lachnum_sp|AM084756|SH189777.06FU|reps|k__Fungi;p__Ascomycota;c__Leotiomycetes;o__Helotiales;f__Hyaloscyphaceae;g__Lachnum;s__Lachnum_sp"
[4] "Lachnum_sp|FM172814|SH189778.06FU|reps|k__Fungi;p__Ascomycota;c__Leotiomycetes;o__Helotiales;f__Hyaloscyphaceae;g__Lachnum;s__Lachnum_sp"
[5] "Lachnum_sp|FN539058|SH189779.06FU|reps|k__Fungi;p__Ascomycota;c__Leotiomycetes;o__Helotiales;f__Hyaloscyphaceae;g__Lachnum;s__Lachnum_sp"
[6] "Lachnum_pulverulentum|AB481260|SH189780.06FU|refs|k__Fungi;p__Ascomycota;c__Leotiomycetes;o__Helotiales;f__Hyaloscyphaceae;g__Lachnum;s__Lachnum_pulverulentum"
set.seed(10)
filter_taxa(unite_ex_data_1, name == "Fungi", subtaxa = TRUE) %>%
heat_tree(node_size = n_obs,
node_color = n_obs,
node_label = name,
layout = "davidson-harel",
overlap_avoidance = 0.5)
If you want to use the structure and names of the classification provided in the header, but still look up the official taxon id, you can provide "class" to key and specify a database to look up taxon ids from. Since the UNITE classification also included the rank for each taxon, we will have to specify how to parse each taxon string using the class_regex and class_key options. These work the same as regex and key, except they apply to each element in a classification. You can still capture the sequence id or taxon id (assuming its present in the header) by using "obs_info" or "taxon_info" where you would otherwise use "obs_id" or "taxon_id".
file_path <- system.file("extdata", "unite_general_release.fasta", package = "metacoder")
sequences <- ape::read.FASTA(file_path)
unite_ex_data_2 <- extract_taxonomy(sequences,
regex = "^(.*)\\|(.*)\\|(.*)\\|.*\\|(.*)$",
key = c(seq_name = "obs_info", seq_id = "obs_info",
other_id = "obs_info", "class"),
class_regex = "^(.*)__(.*)$",
class_key = c(unite_rank = "taxon_info", "name"),
class_sep = ";",
database = "ncbi")
heat_tree(unite_ex_data_2,
node_size = n_obs,
node_color = n_obs,
node_label = name)
Note that the taxon name (entry 1) and the sequence id (entry 2) are now encoded as "obs_info", causing them to be interpreted as generic data. This means that only the classification string (e.g. k__Fungi;p__Ascomycota;c__Leotiomycetes) will be interpreted as having taxonomic information, but the other information will also be included in the output in columns named name and sequence_id. The unique taxon id for each taxon encountered will be looked up and taxa not found will be encoded as NA.
It is also possible to build a custom taxonomy encoding using the taxonomy in the sequence headers without looking up the official taxon ids of each taxon. Taxa will be assigned arbitrary taxon ids that will be specific to the current analysis. This is prefereable to just using unique taxon names as IDs, since there are often placeholder taxa names such as “unknown” or “spp” and occasionally different taxa can have to same name. This is the method that is most useful if available, since it does not require an internet connection. To do this, leave off the database option from the previous example.
file_path <- system.file("extdata", "unite_general_release.fasta", package = "metacoder")
sequences <- ape::read.FASTA(file_path)
unite_ex_data_3 <- extract_taxonomy(sequences,
regex = "^(.*)\\|(.*)\\|(.*)\\|.*\\|(.*)$",
key = c(seq_name = "obs_info", seq_id = "obs_info",
other_id = "obs_info", "class"),
class_regex = "^(.*)__(.*)$",
class_key = c(unite_rank = "taxon_info", "name"),
class_sep = ";")
heat_tree(unite_ex_data_3,
node_size = n_obs,
node_color = n_obs,
node_label = name)
The ITS1 database FASTA header includes the GenBank taxon id. In the example below, both the "obs_id" and "taxon_id" keys are provided, but only the "taxon_id" is used to look up taxonomy information since it has precedence.
file_path <- system.file("extdata", "its1_chytridiomycota_hmm.fasta", package = "metacoder")
sequences <- ape::read.FASTA(file_path)
its1_ex_data <- extract_taxonomy(sequences,
regex = "^(.*)\\|(.*)\\|tax_id:(.*)\\|(.*)$",
key = c("obs_id", taxon_name = "taxon_info",
"taxon_id", description = "obs_info"),
database = "ncbi")
The sequence headers have the format:
[1] "HQ191393_ITS1_HMM|uncultured Chytridiomycota|tax_id:175247|ITS1 located by HMM profiles,179bp length; "
[2] "HQ191391_ITS1_HMM|uncultured Chytridiomycota|tax_id:175247|ITS1 located by HMM profiles,251bp length; "
[3] "KJ464412_ITS1_HMM|Rhizophydium sphaerotheca|tax_id:109902|ITS1 located by HMM profiles,145bp length; "
[4] "HQ191291_ITS1_HMM|uncultured Chytridiomycota|tax_id:175247|ITS1 located by HMM profiles,171bp length; "
[5] "HQ191292_ITS1_HMM|uncultured Chytridiomycota|tax_id:175247|ITS1 located by HMM profiles,178bp length; "
[6] "HQ191295_ITS1_HMM|uncultured Chytridiomycota|tax_id:175247|ITS1 located by HMM profiles,177bp length; "
set.seed(1)
heat_tree(its1_ex_data,
node_size = n_obs,
node_color = n_obs,
node_label = name,
layout = "fruchterman-reingold")
The first observation in the PR2 (Guillou et al. 2012) header is sometimes a GenBank accession number, but not always. Therefore, the best option is to use the included taxonomy information.
file_path <- system.file("extdata", "pr2_stramenopiles_gb203.fasta", package = "metacoder")
sequences <- ape::read.FASTA(file_path)
pr2_ex_data <- extract_taxonomy(sequences,
regex = "^(.*\\..*?)\\|(.*)$",
key = c("obs_id", "class"),
class_sep = "\\|")
The sequence headers have the format:
[1] "10-044.1.1773|Eukaryota|Stramenopiles|Stramenopiles_X|Oomycota|Oomycota_X|Oomycota_XX|Oomycota_XXX|Oomycota_XXX+sp."
[2] "10-045.1.1772|Eukaryota|Stramenopiles|Stramenopiles_X|Oomycota|Oomycota_X|Oomycota_XX|Oomycota_XXX|Oomycota_XXX+sp."
[3] "10-150.1.1771|Eukaryota|Stramenopiles|Stramenopiles_X|Oomycota|Oomycota_X|Oomycota_XX|Oomycota_XXX|Oomycota_XXX+sp."
[4] "10-374.1.1778|Eukaryota|Stramenopiles|Stramenopiles_X|Oomycota|Oomycota_X|Oomycota_XX|Oomycota_XXX|Oomycota_XXX+sp."
[5] "HQ166719.1.1035_U|Eukaryota|Stramenopiles|Ochrophyta|Bacillariophyta|Bacillariophyta_X|Polar-centric-Mediophyceae|Chaetoceros|Chaetoceros+calcitrans"
[6] "JQ781961.1.1805_U|Eukaryota|Stramenopiles|Stramenopiles_X|MAST|MAST-4|MAST-4A|MAST-4A_X|MAST-4A_X+sp."
heat_tree(pr2_ex_data,
node_size = n_obs,
node_color = n_obs,
node_label = name)
This RDP (Maidak et al. 2001) FASTA file does not contain any references to taxon or sequence ids that could be used to look up more information. Instead, we will parse the taxonomy information included in the sequence headers. In this case both the rank and taxon name are supplied.
file_path <- system.file("extdata", "rdp_current_Archaea_unaligned.fa", package = "metacoder")
sequences <- ape::read.FASTA(file_path)
rdp_ex_data <- extract_taxonomy(sequences,
regex = "^(.*?) (.*)\\tLineage=(.*)",
key = c(id = "obs_info", description = "obs_info", "class"),
class_regex = "(.+?);(.*?);",
class_key = c("name", "taxon_info"))
[1] "S001191995 uncultured archaeon; LCDARCH35\tLineage=Root;rootrank;Archaea;domain;\"Euryarchaeota\";phylum;\"Methanomicrobia\";class;Methanosarcinales;order;untaxmap_Methanosarcinales;"
[2] "S002342293 uncultured archaeon; Arch-2\tLineage=Root;rootrank;Archaea;domain;\"Euryarchaeota\";phylum;untaxmap_\"Euryarchaeota\";"
[3] "S002876736 uncultured archaeon; AydatHIV_14\tLineage=Root;rootrank;Archaea;domain;\"Thaumarchaeota\";phylum;Nitrosopumilales;order;Nitrosopumilaceae;family;Nitrosopumilus;genus"
[4] "S002934720 uncultured archaeon; A0423R002_A13\tLineage=Root;rootrank;Archaea;domain;\"Euryarchaeota\";phylum;\"Methanomicrobia\";class;untaxmap_\"Methanomicrobia\";"
[5] "S003785628 uncultured archaeon; PC500-1A66\tLineage=Root;rootrank;Archaea;domain;\"Crenarchaeota\";phylum;Thermoprotei;class;untaxmap_Thermoprotei;"
[6] "S000511720 uncultured crenarchaeote; I3K-F4\tLineage=Root;rootrank;Archaea;domain;\"Thaumarchaeota\";phylum;Nitrosopumilales;order;Nitrosopumilaceae;family;Nitrosopumilus;genus"
Note that the same character ( ; ) is used to delineate different ranks and taxa. This makes it the class_sep option inappropriate. If the class_sep option is not defined, each match of class_regex is considered to be a different taxa. Therefore, we can use a class_regex that matches two ; to correctly parse this awkward format.
set.seed(3)
heat_tree(rdp_ex_data,
node_size = n_obs,
node_color = n_obs,
node_label = name,
layout = "da")
SILVA (Quast et al. 2012) has a relativly simple format with an embedded taxonomy:
file_path <- system.file("extdata", "silva_nr99.fasta", package = "metacoder")
sequences <- ape::read.FASTA(file_path)
silva_ex_data <- extract_taxonomy(sequences,
regex = "^(.*?) (.*)$",
key = c(id = "obs_info", "class"),
class_sep = ";")
[1] "AB001438.1.1662 Eukaryota;SAR;Alveolata;Dinoflagellata;Dinophyceae;Gymnodiniphycidae;Kareniaceae;Gyrodinium sp."
[2] "AB006051.1.1796 Eukaryota;Archaeplastida;Chloroplastida;Chlorophyta;Trebouxiophyceae;Chlorellales;Lobosphaera;Lobosphaera tirolensis"
[3] "FJ911802.1.1606 Eukaryota;Opisthokonta;Holozoa;Metazoa (Animalia);Eumetazoa;Bilateria;Arthropoda;Chelicerata;Arachnida;Acari;Poecilochirus sp. APGD-2010"
[4] "FJ911814.1.1608 Eukaryota;Opisthokonta;Holozoa;Metazoa (Animalia);Eumetazoa;Bilateria;Arthropoda;Chelicerata;Arachnida;Acari;Arrenoseius sp. APGD-2010"
[5] "FJ911819.1.1607 Eukaryota;Opisthokonta;Holozoa;Metazoa (Animalia);Eumetazoa;Bilateria;Arthropoda;Chelicerata;Arachnida;Acari;Eviphis sp. 1 APGD-2010"
[6] "FJ911835.1.1585 Eukaryota;Opisthokonta;Holozoa;Metazoa (Animalia);Eumetazoa;Bilateria;Arthropoda;Chelicerata;Arachnida;Acari;Dermanyssus hirundinis"
set.seed(2)
heat_tree(silva_ex_data,
node_size = n_obs,
node_color = n_obs,
node_label = name,
tree_label = name)
If the input is only the data used to look up classifications, without any other text to seperate from it, the regex argument can be omitted. Say someone emails you a list of ncbi ids that you want to plot. You could use the following code to get the taxonomic information.
raw <- "JQ086376.1 AM946981.2 JQ182735.1 CP001396.1 J02459.1 AC150248.3 X64334.1 CP001509.3 CP006698.1 AC198536.1 JF340119.2 KF771025.1 CP007136.1 CP007133.1 U39286.1 CP006584.1 EU421722.1 U03462.1 U03459.1 AC198467.1 V00638.1 CP007394.1 CP007392.1 HG941718.1 HG813083.1 HG813082.1 CP007391.1 HG813084.1 CP002516.1 KF561236.1 JX509734.1 AP010953.1 U39285.1 M15423.1 X98613.1 CP006784.1 CP007393.1 CU928163.2 AP009240.1 CP007025.1 CP006027.1 CP003301.1 CP003289.1 CP000946.1 CP002167.1 HG428755.1 JQ086370.1 CP001846.1 CP001925.1 X99439.1 AP010958.1 CP001368.1 AE014075.1 CP002212.1 CP003034.1 CP000243.1 AY940193.1 CP004009.1 JQ182732.1 U02453.1 AY927771.1 BA000007.2 CP003109.1 CP007390.1 U02426.1 U02425.1 CP006262.1 HG738867.1 U00096.3 FN554766.1 CP001855.1 L19898.1 AE005174.2 FJ188381.1 AK157373.1 JQ182733.1 U39284.1 U37692.1 AF129072.1 FM180568.1 CP001969.1 HE616528.1 CP002729.1 JF974339.1 AB248924.1 AB248923.1 CP002291.1 X98409.1 CU928161.2 CP003297.1 FJ797950.1 CP000038.1 U82598.1 CP002211.1 JQ806764.1 U03463.1 CP001665.1"
ids <- strsplit(raw, " ")[[1]]
contaminants <- extract_taxonomy(ids, key = "obs_id", database = "ncbi")
set.seed(3)
heat_tree(contaminants,
node_size = n_obs,
node_color = n_obs,
node_label = name,
tree_label = name,
layout = "fruchterman-reingold")
Any list of taxon names can also be parsed by leaving off the regex option. One way to get those names from a website that does not have an API is to parse the HTML of the website using and html parser and XPATH. Note that this example uses ITIS instead of NCBI to look up classifications for the taxa scraped from The Plant List:
library(XML)
taxon_names <- htmlTreeParse("http://www.theplantlist.org/1.1/browse/B/") %>%
xmlRoot() %>%
getNodeSet("//ul[@id='nametree']/li/a/i") %>% # The string is an XPATH expression
sapply(xmlValue)
bryophytes_ex_data <- extract_taxonomy(taxon_names, key = "name", database = "itis")
set.seed(2)
heat_tree(bryophytes_ex_data,
node_size = n_obs,
node_color = n_obs,
node_label = name,
tree_label = name,
layout = "davidson-harel")
sessionInfo()
## R version 3.3.1 (2016-06-21)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.2 LTS
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] metacoder_0.1.2 knitcitations_1.0.7 knitr_1.14
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.9 formatR_1.4 plyr_1.8.4 bitops_1.0-6
## [5] tools_3.3.1 digest_0.6.12 gtable_0.2.0 lubridate_1.6.0
## [9] evaluate_0.10 tibble_1.2 nlme_3.1-128 lattice_0.20-34
## [13] bibtex_0.4.0 igraph_1.0.1 DBI_0.5-1 yaml_2.1.13
## [17] parallel_3.3.1 RefManageR_0.13.1 dplyr_0.5.0 httr_1.2.1
## [21] stringr_1.1.0 rprojroot_1.2 R6_2.2.0 XML_3.98-1.4
## [25] rmarkdown_1.3 RJSONIO_1.3-0 reshape2_1.4.2 ggplot2_2.2.1
## [29] magrittr_1.5 backports_1.0.5 scales_0.4.1 htmltools_0.3.5
## [33] assertthat_0.1 ape_4.0 colorspace_1.2-7 labeling_0.3
## [37] stringi_1.1.2 RCurl_1.95-4.8 lazyeval_0.2.0 munsell_0.4.3
Guillou, Laure, Dipankar Bachar, Stéphane Audic, David Bass, Cédric Berney, Lucie Bittner, Christophe Boutte, et al. 2012. “The Protist Ribosomal Reference Database (PR2): A Catalog of Unicellular Eukaryote Small Sub-Unit RRNA Sequences with Curated Taxonomy.” Nucleic Acids Research. Oxford Univ Press, gks1160.
Kõljalg, Urmas, Karl-Henrik Larsson, Kessy Abarenkov, R Henrik Nilsson, Ian J Alexander, Ursula Eberhardt, Susanne Erland, et al. 2005. “UNITE: A Database Providing Web-Based Methods for the Molecular Identification of Ectomycorrhizal Fungi.” New Phytologist 166 (3). Wiley Online Library: 1063–8.
Maidak, Bonnie L, James R Cole, Timothy G Lilburn, Charles T Parker Jr, Paul R Saxman, Ryan J Farris, George M Garrity, Gary J Olsen, Thomas M Schmidt, and James M Tiedje. 2001. “The RDP-II (Ribosomal Database Project).” Nucleic Acids Research 29 (1). Oxford Univ Press: 173–74.
Paradis, Emmanuel, Julien Claude, and Korbinian Strimmer. 2004. “APE: Analyses of Phylogenetics and Evolution in R Language.” Bioinformatics 20 (2). Oxford Univ Press: 289–90.
Quast, Christian, Elmar Pruesse, Pelin Yilmaz, Jan Gerken, Timmy Schweer, Pablo Yarza, Jörg Peplies, and Frank Oliver Glöckner. 2012. “The SILVA Ribosomal RNA Gene Database Project: Improved Data Processing and Web-Based Tools.” Nucleic Acids Research. Oxford Univ Press, gks1219.
Comments